InferSpark: Statistical Inference at Scale
Abstract
The Apache Spark stack has enabled fast large-scale data processing. Despite its rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the potential to automatically generate inference algorithms for user-defined models and to answer various statistical queries about the model. It is a perfect time to unite these two directions to produce a programmable big data analysis framework. We thus propose InferSpark, a probabilistic programming framework on top of Apache Spark. Efficient statistical inference can be easily implemented on this framework, and the inference process can leverage the distributed main-memory processing power of Spark. This framework makes statistical inference on big data possible and speeds up the penetration of probabilistic programming into the data engineering domain.
Similar resources
ZaliQL: Causal Inference from Observational Data at Scale
Causal inference from observational data is a subject of active research and development in statistics and computer science. Many statistical software packages have been developed for this purpose. However, these toolkits do not scale to large datasets. We propose and demonstrate ZaliQL: a SQL-based framework for drawing causal inference from observational data. ZaliQL supports the state-of-the...
ZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data
Causal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits that depend on statistical software have been developed for this purpose. However, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suit...
Statistical Inference for the Lomax Distribution under Progressively Type-II Censoring with Binomial Removal
This paper considers parameter estimation in the Lomax distribution under progressive type-II censoring with random removals, assuming that the number of units removed at each failure time has a binomial distribution. The maximum likelihood estimators (MLEs) are derived using the expectation-maximization (EM) algorithm. The Bayes estimates of the parameters are obtained using both the squared erro...
Using Probabilistic Views for Large-Scale Statistical Inference
Probabilistic databases extend statistical inference from limited, hand-crafted statistical models to an entire database. Data analysts can discover trends, test hypotheses, and run what-if scenarios by simply running SQL queries. The technical challenge in a probabilistic database is the query processor, which needs to perform a probabilistic inference for every row output by a SQL query: the ...
Review of the Applications of Exponential Family in Statistical Inference
In this paper, after introducing the exponential family and a history of the work done by researchers in the field of statistics, some applications of this family in statistical inference are discussed, especially in the estimation problem, statistical hypothesis testing, and statistical information theory.
Journal:
- CoRR
Volume: abs/1707.02047
Publication date: 2015